Smart Paper Notes

Repair Is Nearly Generation: Multilingual Program Repair with LLMs

Harshit Joshi , José Cambronero , Sumit Gulwani , Vu Le , Ivan Radicek , Gust Verbruggen

Category: Artificial Intelligence

2022-08-24

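Most programmers make mistakes when writing code. Some of these mistakes are small and require few edits to the original program, a class of errors recently termed last-mile mistakes. These errors break the flow of experienced developers and can stump novice programmers. Existing automated repair techniques that target this class of errors are domain-specific and do not easily carry over to new domains: transferring symbolic approaches requires substantial engineering, and neural approaches require data and retraining. We introduce RING, a multilingual repair engine powered by a large language model trained on code (LLMC), such as Codex. Such a multilingual engine enables a flipped model of programming assistance, in which the programmer writes the code and the AI assistant suggests fixes, in contrast to traditional code suggestion technology. Taking inspiration from the way programmers fix bugs manually, we show that a prompt-based strategy that conceptualizes repair as localization, transformation, and candidate ranking can successfully repair programs in multiple domains with minimal effort. We present the first results for such a multilingual repair engine by evaluating it on 6 different domains and comparing its performance to domain-specific repair engines. We show that RING can outperform the domain-specific repair engines in 3 of these domains. We also identify directions for future research on using LLMCs for multilingual repair.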
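The three stages named in the abstract (localization, transformation, candidate ranking) can be illustrated with a toy sketch. This is not RING: the language model is swapped for hand-written stand-ins, so that the example runs on its own. Localization uses Python's built-in `compile`, transformation appends a single token to the faulty line, and ranking is reduced to a validity filter. All function names and the token list are hypothetical.

```python
from typing import List, Optional


def localize(source: str) -> Optional[int]:
    """Return the 1-based line of the first syntax error, or None if the code compiles."""
    try:
        compile(source, "<snippet>", "exec")
        return None
    except SyntaxError as err:
        return err.lineno


def transform(source: str, lineno: int) -> List[str]:
    """Propose candidate programs by appending one token to the faulty line.

    Toy stand-in for a model-driven edit step; the token list is an assumption.
    """
    lines = source.split("\n")
    if lineno < 1 or lineno > len(lines):
        return []
    candidates = []
    for token in (":", ")", "]", '"'):
        edited = list(lines)
        edited[lineno - 1] += token
        candidates.append("\n".join(edited))
    return candidates


def rank(candidates: List[str]) -> List[str]:
    """Keep only candidates that compile, preserving the proposal order."""
    return [c for c in candidates if localize(c) is None]


def repair(source: str) -> Optional[str]:
    """Run localization, transformation, and ranking; return the top candidate."""
    lineno = localize(source)
    if lineno is None:
        return source  # nothing to repair
    ranked = rank(transform(source, lineno))
    return ranked[0] if ranked else None


broken = "def add(a, b)\n    return a + b\n"  # missing colon on line 1
fixed = repair(broken)
```

A real engine would propose far richer edits and would order the valid candidates by some measure of likelihood rather than by proposal order.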

translated by Google Translate

CORNET: A neurosymbolic approach to learning conditional table formatting rules by example

Mukul Singh , José Cambronero , Sumit Gulwani , Vu Le , Carina Negreanu , Mohammad Raza , Gust Verbruggen

Category: Artificial Intelligence

2022-08-11

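Spreadsheets are widely used for table manipulation and presentation. The stylistic formatting of these tables is an important property for both presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on data-dependent rules. Unfortunately, writing these formatting rules can be challenging for users, because it requires knowledge of the underlying rule language and data logic. In this paper, we present CORNET, a neurosymbolic system that tackles the novel problem of automatically learning such formatting rules from user examples of formatted cells. CORNET takes inspiration from inductive program synthesis and combines symbolic rule enumeration, based on semi-supervised clustering and iterative decision tree learning, with a neural ranker to produce conditional formatting rules. To motivate and evaluate our approach, we extracted tables with formatting rules from a corpus of over 40K real spreadsheets. Using this data, we compared CORNET to a wide range of symbolic and neural baselines. Our results show that CORNET learns rules more accurately than these baselines across varying conditions. Beyond learning rules from user examples, we present two case studies to motivate additional uses of CORNET: simplifying users' conditional formatting rules and recovering rules even when the user may have formatted their data manually.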
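To make the task concrete, the sketch below learns a single-threshold formatting rule from fully labeled cells by brute-force enumeration. It is a deliberately simplified illustration, not CORNET's algorithm: there is no semi-supervised clustering, no decision tree, and no neural ranker, and the function name and sample data are hypothetical.

```python
from typing import List, Tuple


def learn_threshold_rule(
    values: List[float], formatted: List[bool]
) -> Tuple[str, float, float]:
    """Enumerate rules 'value > t' and 'value < t' and return the most accurate one.

    Returns (operator, threshold, accuracy). Ties keep the first rule found.
    """
    if len(values) != len(formatted) or not values:
        raise ValueError("values and formatted must be non-empty and equally long")
    best: Tuple[str, float, float] = ("", 0.0, -1.0)
    for t in sorted(set(values)):  # candidate thresholds come from the data
        for op in (">", "<"):
            preds = [(v > t) if op == ">" else (v < t) for v in values]
            acc = sum(p == f for p, f in zip(preds, formatted)) / len(values)
            if acc > best[2]:
                best = (op, t, acc)
    return best


# Hypothetical column of scores; True marks cells the user highlighted.
scores = [35.0, 48.0, 52.0, 61.0, 90.0]
highlighted = [False, False, True, True, True]
rule = learn_threshold_rule(scores, highlighted)
```

Here the learned rule is "value > 48.0", which reproduces every highlighted cell in the sample column. In practice the user labels only a few cells, which is why the abstract describes a semi-supervised approach.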


Neurosymbolic Repair for Low-Code Formula Languages

Rohan Bavishi , Harshit Joshi , José Pablo Cambronero Sánchez , Anna Fariha , Sumit Gulwani , Vu Le , Ivan Radicek , Ashish Tiwari

Category: Artificial Intelligence

2022-07-24

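Most users of low-code platforms, such as Excel and PowerApps, write programs in domain-specific formula languages to carry out nontrivial tasks. Users can often write most of the program they want but introduce small mistakes that yield broken formulas. These mistakes, which can be both syntactic and semantic, are hard for low-code users to identify and fix, even though they can be resolved with just a few edits. We formalize the problem of producing such edits as the last-mile repair problem. To address this problem, we developed LaMirage, a last-mile repair-engine generator that combines symbolic and neural techniques to perform last-mile repair in low-code formula languages. LaMirage takes a grammar and a set of domain-specific constraints/rules, which jointly approximate the target language, and uses them to generate a repair engine that can fix formulas in that language. To tackle the challenges of localizing errors and ranking candidate repairs, LaMirage leverages neural techniques, whereas it relies on symbolic methods to generate the candidate repairs. This combination allows LaMirage to find repairs that satisfy the provided grammar and constraints and then pick the most natural one. We compare LaMirage to state-of-the-art neural and symbolic approaches on 400 real Excel and PowerFX formulas, where LaMirage outperforms all baselines. We release these benchmarks to encourage follow-up work in low-code domains.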


Ontology-based Context Aware Recommender System Application for Tourism

Vitor T. Camacho , José Cruz

Category: Machine Learning

2022-12-29

In this work, a novel recommender system (RS) for tourism is presented. The RS is context-aware, as is now the rule in state-of-the-art recommender systems, and works on top of a tourism ontology that is used to group the different items being offered. The presented RS mixes different types of recommenders, creating an ensemble that changes according to the RS's maturity: it starts from simple content-based recommendations and iteratively adds popularity, demographic, and collaborative filtering methods as rating density and user cardinality increase. The result is an RS that mutates during its lifetime and uses a tourism ontology and natural language processing (NLP) to correctly bin the items into specific item categories and meta-categories in the ontology. This item classification facilitates the association between user preferences and items and allows the items being offered to be better classified and grouped, which in turn is particularly useful for context-aware filtering.


Anomaly detection in laser-guided vehicles' batteries: a case study

Gianfranco Lombardo , Stefano Cagnoni , Stefano Cavalli , Juan José Contreras Gonzáles , Francesco Monica , Monica Mordonini , Michele Tomaiuolo

Category: Machine Learning

2022-12-27

Detecting anomalous data within time series is a very relevant task in pattern recognition and machine learning, with many possible applications that range from disease prevention in medicine (e.g., detecting early alterations of the health status before it can clearly be defined as "illness") to monitoring industrial plants. Regarding the latter application, detecting anomalies in an industrial plant's status firstly prevents serious damage that would require a long interruption of the production process. Secondly, it permits optimal scheduling of maintenance interventions by limiting them to urgent situations, whereas such interventions typically follow a fixed prudential schedule according to which components are substituted well before the end of their expected lifetime. This paper describes a case study on monitoring the status of Laser-guided Vehicle (LGV) batteries, on which we worked as our contribution to project SUPER (Supercomputing Unified Platform, Emilia Romagna), aimed at establishing and demonstrating a regional High-Performance Computing platform that is going to represent the main Italian supercomputing environment for both computing power and data volume.


Scaling Painting Style Transfer

Bruno Galerne , Lara Raad , José Lezama , Jean-Michel Morel

Category: Computer Vision

2022-12-27

Neural style transfer is a deep learning technique that produces an unprecedentedly rich style transfer from a style image to a content image and is particularly impressive when it comes to transferring style from a painting to an image. It was originally achieved by solving an optimization problem to match the global style statistics of the style image while preserving the local geometric features of the content image. The two main drawbacks of this original approach are that it is computationally expensive and that the resolution of the output images is limited by high GPU memory requirements. Many solutions have been proposed to both accelerate neural style transfer and increase its resolution, but they all compromise the quality of the produced images. Indeed, transferring the style of a painting is a complex task involving features at different scales, from the color palette and compositional style to the fine brushstrokes and texture of the canvas. This paper provides a way to solve the original global optimization problem for ultra-high resolution images, enabling multiscale style transfer at unprecedented image sizes. This is achieved by spatially localizing the computation of each forward and backward pass through the VGG network. Extensive qualitative and quantitative comparisons show that our method produces a style transfer of unmatched quality for such high resolution painting styles.
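The abstract does not say which global style statistics are matched. In the original neural style transfer formulation by Gatys et al., they are Gram matrices of VGG feature maps, so the sketch below computes a Gram matrix for a tiny hand-written feature map. It is only an illustration of that classic choice, assuming normalization by the number of spatial positions (conventions vary); the function name and feature values are hypothetical.

```python
from typing import List


def gram_matrix(features: List[List[float]]) -> List[List[float]]:
    """Gram matrix of a feature map given as C channels by N spatial positions.

    Entry (i, j) is the dot product of channels i and j, divided by N.
    Spatial layout is discarded, which is why the statistic is 'global'.
    """
    if not features or not features[0]:
        raise ValueError("features must contain at least one channel and one position")
    n = len(features[0])
    if any(len(channel) != n for channel in features):
        raise ValueError("all channels must have the same number of positions")
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
        for ci in features
    ]


# Hypothetical feature map: 2 channels, 2 spatial positions.
feature_map = [[1.0, 2.0], [3.0, 4.0]]
style_stats = gram_matrix(feature_map)
```

Because each entry sums over all spatial positions, computing it for a very large image requires features for the whole image, which relates to the memory limit the abstract mentions.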


Structure-based drug discovery with deep learning

Rıza Özçelik , Derek van Tilborg , José Jiménez-Luna , Francesca Grisoni

Category: Machine Learning

2022-12-26

Artificial intelligence (AI) in the form of deep learning bears promise for drug discovery and chemical biology, e.g., to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules de novo. While most of the deep learning efforts in drug discovery have focused on ligand-based approaches, structure-based drug discovery has the potential to tackle unsolved challenges, such as affinity prediction for unexplored protein targets, binding-mechanism elucidation, and the rationalization of related chemical kinetic properties. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a renaissance in structure-based approaches for drug discovery guided by AI. This review summarizes the most prominent algorithmic concepts in structure-based deep learning for drug discovery, and forecasts opportunities, applications, and challenges ahead.


Forecasting through deep learning and modal decomposition in multi-phase concentric jets

León Mata , Rodrigo Abadía-Heredia , Manuel Lopez-Martin , José M. Pérez , Soledad Le Clainche

Category: Machine Learning

2022-12-24

This work presents a set of neural network (NN) models specifically designed for accurate and efficient fluid dynamics forecasting. We show how neural network training can be improved by reducing data complexity through a modal decomposition technique called higher order dynamic mode decomposition (HODMD), which identifies the main structures inside flow dynamics and reconstructs the original flow using only these main structures. This reconstruction has the same number of samples and the same spatial dimension as the original flow, but with less complex dynamics, while preserving its main features. We also show the low computational cost required by the proposed NN models, both in their training and inference phases. The core idea of this work is to test the limits of applicability of deep learning models to data forecasting in complex fluid dynamics problems. Generalization capabilities of the models are demonstrated by using the same neural network architectures to forecast the future dynamics of four different multi-phase flows. The data sets used to train and test these deep learning models come from Direct Numerical Simulations (DNS) of these flows.


RouteNet-Fermi: Network Modeling with Graph Neural Networks

Miquel Ferriol-Galmés , Jordi Paillisse , José Suárez-Varela , Krzysztof Rusek , Shihan Xiao , Xiang Shi , Xiangle Cheng , Pere Barlet-Ros , Albert Cabellos-Aparicio

Category: Artificial Intelligence | Machine Learning

2022-12-22

Network models are an essential building block of modern networks. For example, they are widely used in network planning and optimization. However, as networks increase in scale and complexity, some models present limitations, such as the assumption of Markovian traffic in queuing theory models, or the high computational cost of network simulators. Recent advances in machine learning, such as Graph Neural Networks (GNN), are enabling a new generation of network models that are data-driven and can learn complex non-linear behaviors. In this paper, we present RouteNet-Fermi, a custom GNN model that shares the same goals as queuing theory, while being considerably more accurate in the presence of realistic traffic models. The proposed model accurately predicts the delay, jitter, and loss in networks. We have tested RouteNet-Fermi in networks of increasing size (up to 300 nodes), including samples with mixed traffic profiles -- e.g., with complex non-Markovian models -- and arbitrary routing and queue scheduling configurations. Our experimental results show that RouteNet-Fermi achieves accuracy similar to that of computationally expensive packet-level simulators and that it scales accurately to large networks. For example, the model produces delay estimates with a mean relative error of 6.24% when applied to a test dataset with 1,000 samples, including network topologies one order of magnitude larger than those seen during training.
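For readers unfamiliar with the metric, the sketch below computes a mean relative error in its common form, the average of |predicted - actual| / |actual|. The abstract does not spell out the exact definition used in the paper, so this form is an assumption, and the delay values are made up for illustration.

```python
from typing import Sequence


def mean_relative_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted - actual| / |actual|, returned as a fraction.

    Assumes every actual value is non-zero; a zero would make the ratio undefined.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return total / len(actual)


# Hypothetical per-path delays in seconds (illustrative values, not from the paper).
true_delays = [0.10, 0.20, 0.40]
predicted_delays = [0.11, 0.19, 0.40]
mre = mean_relative_error(predicted_delays, true_delays)
print(f"mean relative error: {mre:.2%}")
```

Under this definition, a reported 6.24% would mean that the predicted delay deviates from the reference delay by 6.24% of the reference value on average.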


SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations

Ioannis Tsiamas , José A. R. Fonollosa , Marta R. Costa-jussà

Category: Natural Language Processing

2022-12-19

Data scarcity is one of the main issues with the end-to-end approach for Speech Translation, as compared to the cascaded one. Although most data resources for Speech Translation are originally document-level, they offer a sentence-level view, which can be directly used during training. But this sentence-level view is single and static, potentially limiting the utility of the data. Our proposed data augmentation method, SegAugment, challenges this idea and aims to increase data availability by providing multiple alternative sentence-level views of a dataset. Our method relies heavily on an Audio Segmentation system to re-segment the speech of each document, after which we obtain the target text with alignment methods. The Audio Segmentation system can be parameterized with different length constraints, thus giving us access to multiple and diverse sentence-level views for each document. Experiments on MuST-C show consistent gains across 8 language pairs, with an average increase of 2.2 BLEU points, and up to 4.7 BLEU for lower-resource scenarios in mTEDx. Additionally, we find that SegAugment is also applicable to purely sentence-level data, as in CoVoST, and that it enables Speech Translation models to completely close the gap between the gold and automatic segmentation at inference time.
